Attorney Docket: 10559/293001/P93 00
ARRAY SEARCHING OPERATIONS 

BACKGROUND 

This invention relates to array searching operations
for a computer.

Many conventional programmable processors, such as
digital signal processors (DSPs), support a rich instruction
set that includes numerous instructions for manipulating
arrays of data. These operations are typically
computationally intensive and can require significant
computing time, depending upon the number of execution
units, such as multiply-accumulate units (MACs), within the
processor.

DESCRIPTION OF DRAWINGS

Figure 1 is a block diagram illustrating an example of 
a pipelined programmable processor according to the 
invention. 

Figure 2 is a block diagram illustrating an example
execution pipeline for the programmable processor.

Figure 3 is a flowchart for implementing an example 
array manipulation machine instruction according to the 
invention. 

Figure 4 is a flowchart of an example routine for
invoking the machine instruction.

DESCRIPTION

Figure 1 is a block diagram illustrating a
programmable processor 2 having an execution pipeline 4 and
a control unit 6. Processor 2, as explained in detail
below, reduces the computational time required by array
manipulation operations. In particular, processor 2 may
support a machine instruction, referred to herein as the
SEARCH instruction, that reduces the computational time to
search an array of numbers in a pipelined processing
environment.

Pipeline 4 has a number of stages for processing
instructions. Each stage processes concurrently with the
other stages and passes results to the next stage in
pipeline 4 at each clock cycle. The final results of each
instruction emerge at the end of the pipeline in rapid
succession.

Control unit 6 controls the flow of instructions and
data through the various stages of pipeline 4. During the
processing of an instruction, for example, control unit 6
directs the various components of pipeline 4 to fetch
and decode the instruction, perform the corresponding
operation and write the results back to memory or local
registers.

Figure 2 illustrates an example pipeline 4 configured
according to the invention. Pipeline 4, for example, has
five stages: instruction fetch (IF), decode (DEC), address
calculation (AC), execute (EX) and write back (WB).
Instructions are fetched from memory, or from an
instruction cache, during the IF stage by fetch unit 21 and
decoded within address registers 22 during the DEC stage.
At the next clock cycle, the results pass to the AC stage,
where data address generators 23 calculate any memory
addresses that are necessary to perform the operation.

During the EX stage, execution units 25A through 25M
perform the specified operation such as, for example,
adding or multiplying numbers, in parallel. Execution
units 25 may contain specialized hardware for performing
the operations including, for example, one or more
arithmetic logic units (ALUs), floating-point units (FPUs)
and barrel shifters. A variety of data can be applied to
execution units 25 such as the addresses generated by data
address generators 23, data retrieved from data memory 18 or
data retrieved from data registers 24. During the final
stage (WB), the results are written back to data memory or
to data registers 24.
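
By way of illustration only, the overlap of the five stages
described above can be modeled with a short Python sketch.
The sketch assumes an idealized pipeline in which one
instruction enters the IF stage on every clock cycle and no
stage ever stalls; the function name pipeline_schedule is
hypothetical and is not part of processor 2.

```python
# Stage names taken from the description of pipeline 4.
STAGES = ["IF", "DEC", "AC", "EX", "WB"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per clock cycle mapping stage name -> instruction index.

    Assumption: an ideal pipeline with no stalls, where instruction i
    enters the first stage on cycle i and advances one stage per cycle.
    """
    total_cycles = num_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(stages):
            instr = cycle - depth  # instruction currently in this stage
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        schedule.append(occupancy)
    return schedule
```

Under these assumptions, four instructions occupy the
pipeline for eight clock cycles in total, and after the
pipeline fills one result emerges from the WB stage on each
cycle.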

The SEARCH instruction supported by processor 2 may
allow software applications to search an array of N data
elements by issuing N/M SEARCH instructions, where M is the
number of data elements that can be processed in parallel
by execution units 25 of pipeline 4. Note, however, that a
single execution unit may be capable of executing two or
more operations in parallel. For example, an execution
unit may include a 32-bit ALU capable of concurrently
comparing two 16-bit numbers.

Generally, the sequence of SEARCH instructions allows
the processor to process M sets of elements in parallel to
identify an "extreme value", such as a maximum or a
minimum, for each set. During the execution of the SEARCH
instructions, processor 2 stores references to the location
of the extreme value of each of the M sets of elements.
Upon completion of the N/M instructions, as described in
detail below, the software application analyzes the
references to the extreme values for each set to quickly
identify an extreme value for the array. For example, the
instruction allows the software applications to quickly
identify either the first or last occurrence of a maximum
or minimum value. Furthermore, as explained in detail
below, processor 2 implements the operation in a fashion
suitable for vectorizing in a pipelined processor across
the M execution units 25.
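
By way of illustration only, the N/M instruction count and
the handling of two 16-bit numbers as one 32-bit quantity
can be sketched in Python as follows. The helper names are
hypothetical. The sketch assumes signed 16-bit elements,
assumes N is a multiple of M, and places the even element in
the most significant half of the 32-bit quantity.

```python
def pack_pair(even, odd):
    """Pack two signed 16-bit values into one 32-bit quantity.

    Assumption: the even element occupies the most significant half.
    """
    return ((even & 0xFFFF) << 16) | (odd & 0xFFFF)

def _to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement signed value."""
    return x - 0x10000 if x & 0x8000 else x

def unpack_pair(word):
    """Split a 32-bit quantity back into (even, odd) signed 16-bit values."""
    return _to_signed16((word >> 16) & 0xFFFF), _to_signed16(word & 0xFFFF)

def search_instruction_count(n, m):
    """Number of SEARCH instructions needed for n elements, m per instruction."""
    if m <= 0 or n % m != 0:
        raise ValueError("this sketch assumes n is a positive multiple of m")
    return n // m
```

For example, an array of 16 elements processed two at a
time would require search_instruction_count(16, 2), i.e.,
8 SEARCH instructions.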

As described above, a software application searches an
array of data by issuing N/M SEARCH machine instructions to
processor 2. Figure 3 is a flowchart illustrating an
example mode of operation 20 for processor 2 when it
receives a single SEARCH machine instruction. Process 20
is described with reference to identifying the last
occurrence of a minimum value within the array of elements;
however, process 20 can be easily modified to perform other
functions such as identifying the first occurrence of a
minimum value, the first occurrence of a maximum value or
the last occurrence of a maximum value.

For exemplary purposes, process 20 is described
assuming M equals 2, i.e., processor 2 concurrently
processes two sets of elements, each set having N/2
elements. However, the process is not limited as such and
is readily extensible to concurrently process more than two
sets of elements. In general, process 20 facilitates
vectorization of the search process by fetching pairs of
elements as a single data quantity and processing the
element pairs through pipeline 4 in parallel, thereby
reducing the total number of clock cycles necessary to
identify the minimum value within the array. Although
applicable to other architectures, process 20 is well
suited for a pipelined processor 2 having multiple
execution units in the EX stage. Process 20 maintains two
pointer registers, P_even and P_odd, one for each set of
elements, that store the location of the current extreme
value within the corresponding set. In addition, process 20
maintains two accumulators, A0 and A1, that hold the current
extreme values for the sets. The pointer registers and the
accumulators, however, may readily be implemented as
general-purpose data registers without departing from
process 20.

Referring to Figure 3, in response to each SEARCH
instruction, processor 2 fetches a pair of elements in one
clock cycle as a single data quantity (21). For example,
processor 2 may fetch two adjacent 16-bit values as one
32-bit quantity. Next, processor 2 compares the even element
of the pair to a current minimum value for the even
elements (22) and the odd element of the pair to a current
minimum value for the odd elements (24).

When a new minimum value for the even elements is
detected, processor 2 updates accumulator A0 to hold the
new minimum value and updates pointer register P_even to
hold a pointer to the corresponding data quantity within the
array (23). Similarly, when a new minimum value for the
odd elements is detected, processor 2 updates
accumulator A1 and pointer register P_odd (25). In this
example, each pointer register P_even and P_odd points to the
data quantity and not the individual elements, although the
process is not limited as such. Processor 2 repeats the
process until all of the elements within the array have
been processed (26). Because processor 2 is pipelined,
element pairs may be fetched until the array is processed.
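
By way of illustration only, the per-instruction behavior
described above for M equal to 2 can be modeled in Python as
follows. This is a software sketch and not the hardware
implementation. The function names are hypothetical, the
pointers are modeled as indices of data quantities (pairs),
the accumulators are assumed to start at positive infinity,
and pipeline timing effects are ignored.

```python
def search_step(state, pair, index):
    """Model one SEARCH instruction (steps 21 through 25) for M = 2.

    state: dict with accumulators 'A0', 'A1' and pointers 'P_even', 'P_odd'.
    pair:  (even_element, odd_element), fetched as one data quantity.
    index: position of that data quantity within the array of pairs.
    """
    even, odd = pair
    # "Less than or equal" keeps the LAST occurrence of each minimum.
    if even <= state["A0"]:
        state["A0"] = even
        state["P_even"] = index
    if odd <= state["A1"]:
        state["A1"] = odd
        state["P_odd"] = index
    return state

def search_pairs(values):
    """Issue one modeled SEARCH step per pair until the array is processed."""
    # Assumed initial values; the text only says the registers are initialized.
    state = {"A0": float("inf"), "A1": float("inf"), "P_even": 0, "P_odd": 0}
    for index in range(len(values) // 2):
        search_step(state, (values[2 * index], values[2 * index + 1]), index)
    return state
```

For the array [5, 3, 2, 9, 2, 3, 7, 8], the even elements
are 5, 2, 2, 7 and the odd elements are 3, 9, 3, 8, so the
sketch ends with A0 = 2 and A1 = 3, both pointers
referencing the third data quantity (index 2).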

The following illustrates exemplary syntax for 
invoking the machine instruction: 

(P_odd, P_even) = SEARCH R_data LE, R_data = [P_fetch_addr++]

Data register R_data is used as a scratch register to
store each newly fetched data element pair, with the least
significant word of R_data holding the odd element and the
most significant word of R_data holding the even element.
Two accumulators, A0 and A1, are implicitly used to store
the actual values of the results. An additional register,
P_fetch_addr, is incremented when the SEARCH instruction is
issued and is used as a pointer to iterate over the N/2
data quantities within the array. The defined condition,
such as "less than or equal" (LE) in the above example,
controls which comparison is executed and when the pointer
registers P_even and P_odd, as well as the accumulators A0 and
A1, are updated. The "LE" condition, for example, directs
processor 2 to identify the last occurrence of the minimum
value.
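
By way of illustration only, the effect of the defined
condition on which occurrence is found can be shown with a
scalar Python sketch. Only the LE condition appears in the
syntax above; the mnemonics LT, GE and GT, and the function
name find_extreme, are assumptions introduced here for the
other search functions mentioned in the description.

```python
import operator

# Mapping from condition mnemonic to the comparison it selects.
CONDITIONS = {
    "LE": operator.le,  # last occurrence of the minimum (from the text)
    "LT": operator.lt,  # first occurrence of the minimum (assumed mnemonic)
    "GE": operator.ge,  # last occurrence of the maximum (assumed mnemonic)
    "GT": operator.gt,  # first occurrence of the maximum (assumed mnemonic)
}

def find_extreme(values, condition):
    """Scan values once and return (extreme_value, index) under the condition."""
    compare = CONDITIONS[condition]
    best, best_index = values[0], 0
    for index in range(1, len(values)):
        # A non-strict comparison replaces the stored index on ties,
        # so it tracks the last occurrence; a strict one keeps the first.
        if compare(values[index], best):
            best, best_index = values[index], index
    return best, best_index
```

With the values [4, 1, 7, 1, 7], the LE condition yields
index 3 for the minimum while LT yields index 1, which is
the distinction between the last and first occurrence.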

In a typical application, a programmer develops a
software application or subroutine that issues the N/M
SEARCH instructions, typically from within a loop construct.
The programmer may write the software application in
assembly language or in a high-level software language. A
compiler is typically invoked to process the high-level
software application and generate the appropriate machine
instructions for processor 2, including the SEARCH machine
instructions for searching the array of data.

Figure 4 is a flowchart of an example software routine
30 for invoking the example machine instructions
illustrated above. First, software routine 30
initializes the registers, including initializing A0 and A1
and pointing P_even and P_odd to the first data quantity within
the array (31). In one embodiment, software routine 30
initializes a loop count register with the number of SEARCH
instructions to issue (N/M). Next, routine 30 issues the
SEARCH machine instruction N/M times. This can be
accomplished in a number of ways, such as by invoking a
hardware loop construct supported by processor 2. Often,
however, a compiler may unroll a software loop into a
sequence of identical SEARCH instructions (32).

After issuing N/M SEARCH instructions, A0 and A1 hold
the last occurrence of the minimum even value and the last
occurrence of the minimum odd value, respectively.
Furthermore, P_even and P_odd store the locations of the two
data quantities that hold the last occurrence of the
minimum even value and the last occurrence of the minimum
odd value.

Next, in order to identify the last occurrence of the
minimum value for the entire array, routine 30 first
increments P_odd by a single element, such that P_odd points
directly at the minimum odd element (33). Routine 30
compares the accumulators A0 and A1 to determine whether
the accumulators contain the same value, i.e., whether the
minimum of the odd elements equals the minimum of the even
elements (34). If so, routine 30 compares the pointers
to determine whether P_odd is less than P_even and, therefore,
whether the minimum odd value or the minimum even value
occurred earlier in the array (35). Based on the comparison,
the routine determines whether to copy P_odd into P_even (37).

When the accumulators A0 and A1 are not the same, the
routine compares A0 to A1 in order to determine which holds
the minimum value (36). If A1 is less than A0, then routine
30 sets P_even equal to P_odd, thereby copying the pointer to
the minimum value from P_odd into P_even (37).

At this point, P_even points to the last occurrence of
the minimum value for the entire array. Next, routine 30
adjusts P_even to compensate for errors introduced by the
pipelined architecture of processor 2 (38). For example,
the comparisons described above are typically performed in
the EX stage of pipeline 4 while incrementing the pointer
register P_fetch_addr typically occurs during the AC stage,
thereby causing P_odd and P_even to be incorrect by a known
quantity. After adjusting P_even, routine 30 returns P_even as
a pointer to the last occurrence of the minimum value
within the array (39).
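
By way of illustration only, steps 33 through 39 of routine
30 can be sketched in Python as follows. The function name
resolve_last_minimum is hypothetical. The sketch models each
pointer as the element index at which its data quantity
begins, and it models the pipeline compensation as
subtracting a caller-supplied offset; the direction and size
of that adjustment are assumptions, since the text states
only that the error is a known quantity.

```python
def resolve_last_minimum(a0, a1, p_even, p_odd, pipeline_offset=0):
    """Combine the even and odd results into one pointer for the whole array.

    a0, a1:        minimum of the even and of the odd elements.
    p_even, p_odd: element index where the data quantity holding each
                   minimum begins (the even element of that pair).
    pipeline_offset: assumed known error to subtract (step 38).
    """
    p_odd += 1                 # step 33: point directly at the odd element
    if a0 == a1:               # step 34: both sets share the same minimum
        if p_odd > p_even:     # step 35: the odd minimum occurred later
            p_even = p_odd     # step 37
    elif a1 < a0:              # step 36: the odd set holds the array minimum
        p_even = p_odd         # step 37
    return p_even - pipeline_offset  # steps 38 and 39
```

For the array [5, 3, 2, 9, 2, 3, 7, 8], the even minimum 2
and the odd minimum 3 both lie in the pair beginning at
element 4, so resolve_last_minimum(2, 3, 4, 4) returns 4,
the position of the last occurrence of the value 2.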

Figure 5 illustrates the operation for a single SEARCH
instruction as generalized to the case where processor 2 is
capable of processing M elements of the array in parallel,
such as when processor 2 includes M execution units. The
SEARCH instruction causes processor 2 to fetch M elements
in a single fetch cycle (51). Furthermore, in this
example, processor 2 maintains M pointer registers to store
addresses (locations) of a corresponding extreme value for
each of the M sets of elements. After fetching the M
elements, processor 2 concurrently compares the M elements
to a current extreme value for the respective element set,
as stored in M accumulators (52). Based on the
comparisons, processor 2 updates the M accumulators and the
M pointer registers (53).
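
By way of illustration only, the generalized single
instruction of Figure 5 can be modeled in Python as follows.
The function name search_step_m is hypothetical. The lanes
are visited in a sequential loop, which gives the same
result as the concurrent comparison described above because
each lane reads and writes only its own accumulator and
pointer.

```python
import operator

def search_step_m(accumulators, pointers, elements, address, compare=operator.le):
    """Model one SEARCH instruction over M lanes (steps 51 through 53).

    accumulators: list of M current extreme values, updated in place.
    pointers:     list of M locations of those extremes, updated in place.
    elements:     the M elements fetched as one data quantity (step 51).
    address:      location of that data quantity within the array.
    compare:      condition deciding when a lane is updated.
    """
    for lane, element in enumerate(elements):
        if compare(element, accumulators[lane]):   # step 52
            accumulators[lane] = element           # step 53
            pointers[lane] = address
    return accumulators, pointers
```

Each pointer here records the location of the data quantity
rather than of the individual element, consistent with the
M equal to 2 example.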

Figure 6 illustrates the general case where a software
application issues N/M SEARCH instructions and, upon
completion of the instructions, determines the extreme
value for the entire array. First, the software
application initializes a loop counter, the M accumulators
used to store the current extreme values for the M element
sets and the M pointers used to store the locations of the
extreme values (61). Next, the software application issues
N/M SEARCH instructions (62). After completion of the
instructions, the software application may adjust the M
pointer registers so that each correctly references its
respective extreme value, instead of the data quantity
holding the extreme value (63). After adjusting the pointer
registers, the software application compares the M extreme
values for the M element sets to identify an extreme value
for the entire array, i.e., a maximum value or a minimum
value (64). Then, the software application may use the
pointer registers to determine whether more than one of the
element sets has an extreme value equal to the array extreme
value and, if so, determine which extreme value occurred
first, or last, depending upon the desired search function
(65).
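
By way of illustration only, the general routine of Figure 6
can be sketched end to end in Python as follows. The
function name search_array and its parameters are
hypothetical. The sketch assumes N is a multiple of M,
initializes the accumulators from the first data quantity,
models pointers as element indices, and ignores pipeline
timing effects.

```python
import operator

def search_array(values, m, find_max=False, want_last=True):
    """Return (extreme_value, element_index) for the whole array."""
    if not values or len(values) % m != 0:
        raise ValueError("this sketch assumes len(values) is a multiple of m")
    # Non-strict comparisons track the last occurrence, strict ones the first.
    if find_max:
        compare = operator.ge if want_last else operator.gt
    else:
        compare = operator.le if want_last else operator.lt

    # Step 61: initialize M accumulators and M pointers (assumed initial values).
    acc = list(values[:m])
    ptr = [0] * m  # each pointer holds the start index of a data quantity

    # Step 62: one modeled SEARCH instruction per group of M elements.
    for base in range(0, len(values), m):
        for lane in range(m):
            if compare(values[base + lane], acc[lane]):
                acc[lane] = values[base + lane]
                ptr[lane] = base

    # Step 63: adjust each pointer to reference its element, not the quantity.
    ptr = [p + lane for lane, p in enumerate(ptr)]

    # Step 64: compare the M extreme values to find the array extreme.
    extreme = max(acc) if find_max else min(acc)

    # Step 65: among lanes holding the array extreme, pick first or last.
    candidates = [ptr[lane] for lane in range(m) if acc[lane] == extreme]
    return extreme, (max(candidates) if want_last else min(candidates))
```

For the array [5, 3, 2, 9, 2, 3, 7, 8], the sketch reports
the last occurrence of the minimum value 2 at element 4
whether M is 2 or 4, and the first occurrence at element 2
when want_last is False.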

Various embodiments of the invention have been
described. For example, a single machine instruction has
been described that searches an array of data in a manner
that facilitates vectorization of the search process within
a pipelined processor. The processor may be implemented in
a variety of systems including general purpose computing
systems, digital processing systems, laptop computers,
personal digital assistants (PDAs) and cellular phones.
For example, cellular phones often maintain an array of
values representing signal strength for services available
360° around the phone. In this context, the process
discussed above can be readily used upon initialization of
the cellular phone to scan the available services and
quickly select the best service. In such a system, the
processor may be coupled to a memory device, such as a
FLASH memory device or a static random access memory
(SRAM), that stores an operating system and other software
applications. These and other embodiments are within the
scope of the following claims.

What is claimed is:
1. A method comprising:
receiving a machine instruction directing a processor
to search a plurality of data elements; and
executing the machine instruction by:
retrieving M data elements in a single fetch
cycle;
concurrently comparing the M data elements to a
corresponding current extreme value; and
updating a set of references based on the
comparisons.

2. The method of claim 1, wherein retrieving the M data
elements comprises retrieving the M data elements as a
single data quantity containing the M data elements.

3. The method of claim 2, wherein the set of references
comprises pointer registers to store addresses for data
quantities.

4. The method of claim 1, wherein M = 1.

5. The method of claim 1, wherein M = 2.

6. The method of claim 1, wherein executing the machine
instruction further includes:
storing the current extreme values in M accumulators;
and
copying the M data elements to the accumulators based
on the comparisons.

7. The method of claim 5, wherein concurrently comparing
the data elements comprises processing a first data element
with a first execution unit of a pipelined processor and
processing a second data element with a second execution
unit of the pipelined processor.

8. The method of claim 5, wherein concurrently comparing
the data elements comprises concurrently processing a first
data element and a second data element within a single
execution unit of a pipelined processor.

9. The method of claim 1, wherein concurrently comparing
each of the data elements to a current extreme value
includes determining whether each of the data elements is
less than the corresponding current extreme value.

10. The method of claim 1, wherein concurrently comparing
each of the data elements to a current extreme value
includes determining whether each of the data elements is
greater than the corresponding current extreme value.

11. A method for searching an array of N data elements for
a value comprising:
issuing N/M machine instructions to a processor,
wherein the processor is adapted to process M data elements
in parallel; and
analyzing results of the machine instructions to
identify a value for the array.

12. The method of claim 11, further comprising:
executing each machine instruction by:
retrieving M data elements in a single fetch cycle,
concurrently comparing each of the M data elements to
a corresponding current extreme value, and
updating the references based on the comparisons.

13. A method comprising:
retrieving a pair of data elements from an array of
elements in a single fetch operation, wherein the pair of
data elements includes an even data element and an odd data
element;
substantially comparing the even element of the pair
and the odd element of the pair; and
substantially fetching and comparing the remaining
pairs of data elements of the array until all of the data
elements of the array have been processed.

14. The method of claim 13, wherein substantially
comparing the pair of data elements includes setting an
even minimum value as a function of the even element of the
element pair and setting an odd minimum value as a function
of the odd element of the element pair.

15. The method of claim 13, wherein substantially
comparing the pair of data elements includes maintaining a
first accumulator to store a minimum value for the even
elements and a second accumulator to store a minimum value
for the odd elements.

16. The method of claim 13, further including maintaining
a first pointer register to store an address for the
minimum value of the even data elements and maintaining a
second pointer register to store an address for the minimum
value of the odd data elements.

17. The method of claim 16, further including adjusting at
least one of the pointer registers after processing all of
the pairs of data elements to account for a number of
stages in the pipeline.

18. The method of claim 13, wherein the method is invoked
by issuing N/M machine instructions to a programmable
processor, wherein N equals the number of elements in the
array and M equals the number of data elements that the
processor can concurrently compare.

19. An apparatus comprising:
a pipeline adapted to process M data elements in
parallel; and
a control unit adapted to direct the execution
pipeline to search an array of N data elements in response
to N/M machine instructions.

20. The apparatus of claim 19, wherein in response to the
machine instructions, the control unit directs the pipeline
to retrieve M data elements from the array of elements in a
single fetch operation and concurrently compare the data
elements to a corresponding current extreme value.

21. The apparatus of claim 19, wherein the pipeline
includes M registers adapted to store references to the
extreme values.

22. The apparatus of claim 21, wherein the registers are
pointer registers.

23. The apparatus of claim 21, wherein the registers are
general-purpose data registers.

24. The apparatus of claim 19, wherein the pipeline
includes M accumulators to store M current extreme values.

25. The apparatus of claim 19, wherein the pipeline
includes M general-purpose registers to store M current
extreme values.

26. An article comprising a medium having
computer-executable instructions stored thereon for compiling
a software program, wherein the computer-executable
instructions are adapted to generate N/M machine
instructions to search an array of N data elements, each
machine instruction causing a programmable processor to:
retrieve M data elements from an array of N elements
in a single fetch operation; and
substantially compare each of the M data elements to a
corresponding current extreme value.

27. The article of claim 26, wherein each machine
instruction causes the processor to update a set of
references based on the comparisons.

28. The article of claim 26, wherein each machine
instruction causes the processor to concurrently process a
first data element and a second data element within a
single execution unit of a pipelined processor.

29. A system comprising:
a memory device; and
a processor coupled to the memory device, wherein the
processor includes a pipeline configured to process M data
elements in parallel and a control unit configured to
direct the pipeline to search an array of N data elements
in response to N/M machine instructions.

30. The system of claim 29, wherein in response to each
machine instruction, the control unit directs the
pipeline to retrieve M data elements from the array of
elements in a single fetch operation and concurrently
compare the data elements to a corresponding current
extreme value.

31. The system of claim 29, wherein the pipeline includes
M registers configured to store references to the extreme
values.

32. The system of claim 31, wherein the registers are
pointer registers.

33. The system of claim 31, wherein the registers are
general-purpose data registers.

34. The system of claim 29, wherein the memory device
comprises static random access memory.

35. The system of claim 29, wherein the memory device
comprises FLASH memory.

ARRAY SEARCHING OPERATIONS
ABSTRACT

In one embodiment, a programmable processor searches
an array of N data elements in response to N/M machine
instructions, where the processor has a pipeline configured
to process M data elements in parallel. In response to the
machine instructions, a control unit directs the pipeline
to retrieve M data elements from the array of elements in a
single fetch cycle, concurrently compare the data elements
to M current extreme values, and update the current extreme
values, as well as M references to the current extreme
values, based on the comparisons.